Add insertion methods with fallible allocation to cranelift_bforest::{Map,Set}#12685
Add insertion methods with fallible allocation to cranelift_bforest::{Map,Set}#12685fitzgen wants to merge 1 commit intobytecodealliance:mainfrom
cranelift_bforest::{Map,Set}#12685Conversation
|
Before going too too far down this route (this PR is fine in isolation though), could you detail a bit more the end state here? I'm assuming this is eventually going to make its way into the So overall I'm not quite sure I see the vision of where this is going to end up, hence the question -- could you detail a bit more about how this is going to make its way into Wasmtime? We have a surprisingly large number of |
Subscribe to Label Actioncc @fitzgen DetailsThis issue or pull request has been labeled: "cranelift", "fuzzing"Thus the following users have been cc'd because of the following labels:
To subscribe or unsubscribe from this label, edit the |
Yeah, we do use My plan is to create a #[derive(...)]
struct BTreeMapValueIndex(u32);
entity_impl!(BTreeMapValueIndex);
pub struct BTreeMap<K, V>
where
K: Copy,
{
values: PrimaryMap<BTreeMapValueIndex, V>,
forest: cranelift_bforest::MapForest<K, BTreeMapValueIndex>,
map: cranelift_bforest::Map<K, BTreeMapValueIndex>,
}
impl<K, V> BTreeMap<K, V>
where
K: Copy,
{
pub fn get(&self, key: K) -> Option<&V>
where
K: Ord,
{
let idx = self.map.get(key, &self.forest, &()?;
Some(&self.values[idx])
}
// etc...
}I haven't done a full audit, but everything I've encountered so far has If you are concerned about the performance implications this extra indirection has, we can put it behind a cargo feature. When the feature is not enabled, we would wrap |
|
Ah ok that makes sense yeah. And to confirm, I'm not familiar with the implementation of bforest, but given the documented caveats you're confident that this is a suitable replacement for our usages of Also, that design may be a bit tricky insofar as it won't easily support removal without further maps/etc for tracking that. I don't think we need to remove from Perf-wise I'm not familiar enough with bforest to have much of an opinion on that. I don't think we need a hyper-optimal data structure here and just something that isn't O(n) in the worst case. I do think this is a reasonable enough time to rethink our data structures, though, perhaps? For example It's been a bit though since I looked at |
Those documented caveats seem workable, AFAICT. I haven't investigated exact size differences, but I think it should be within a constant factor of the same size.
I guess my initial sketch had pub fn remove(&mut self, key: K) -> Option<V> {
let id = self.map.remove(key, &mut self.forest, &())?;
Some(self.values.dealloc(id))
}
Generally a fan of re-thinking data structures where it makes sense as we go through and touch various things. However, in the particular case of enum ModuleRegistryData {
Few(Vec<Module>),
Many(BTreeMap<RegisteredModuleId, Module>),
}I'm fairly meh. We need a What might be worth it would be something like this tho: enum ModuleRegistryData {
One(Module),
Many(BTreeMap<RegisteredModuleId, Module>),
}This would let us avoid any allocation at all in the extremely common case of a single module in the registry. |
alexcrichton
left a comment
There was a problem hiding this comment.
Well, in any case, this is of course fine as-is, and probably best to keep discussing in further PRs/issues/etc.
I think it should be within a constant factor of the same size
Mind double-checking? I'd basically want to make sure the constant isn't so big to be egregious. Store<T> is probably already way too big as-is...
we probably actually want values:
Slab<V>
makes sense yeah
and doing something like ... I'm fairly meh
I agree yeah, I don't think we've crossed the complexity threshold to those just yet, and I agree it doesn't remove the need for BTreeMap, so it's probably fine to defer that to some future optimizations
Along the lines of rethinking data structures, one example is that within ModuleRegistry I don't think that either of ModuleRegistry::{store_code, modules} need to be a BTreeMap. Those look like they can both be hash maps. ModuleRegistry::loaded_code is kind of the most fundamental one, but we can probably get away with removing LoadedCode::modules. The latter needs range queries but we can probably switch that to a sorted vector since it's not dynamically added to (it's a per-component thing so we could precompute it at component-load-time or something like that).
The GlobalRegistry is also pretty fundamentally a BTreeMap, but overall I think we'll only need two in the long-run?
Agreed, but one caveat I should mention is that its iteration order is exposed via the debug APIs ( |
|
Good point yeah, that'd be a better fit for |
No description provided.